104 research outputs found
Semantic Structure based Query Graph Prediction for Question Answering over Knowledge Graph
Building query graphs from questions is an important step in complex question answering over knowledge graphs (Complex KGQA). In general, a question can be correctly answered if its query graph is built correctly and the right answer is then retrieved by issuing the query graph against the KG. Therefore, this paper focuses on query graph generation from natural language questions. Existing approaches for query graph generation ignore the semantic structure of a question, resulting in a large number of noisy query graph candidates that undermine prediction accuracy. In this paper, we define six semantic structures from common questions in KGQA and develop a novel Structure-BERT to predict the semantic structure of a question. Candidates that do not match the predicted structure are filtered out, and the remaining candidates are ranked with a BERT-based ranking model. Extensive experiments on two popular benchmarks, MetaQA and WebQuestionsSP, demonstrate the effectiveness of our method compared to state-of-the-art approaches.
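The predict-then-rank idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Candidate` class, the structure labels, and `toy_score` are hypothetical stand-ins for the query graph candidates, the Structure-BERT prediction, and the BERT-based ranker, and dropping candidates whose structure disagrees with the prediction is an assumption about what "remaining candidates" means.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    query_graph: str  # hypothetical textual form of a candidate query graph
    structure: str    # hypothetical semantic-structure label, e.g. "one-hop"


def filter_then_rank(
    candidates: List[Candidate],
    predicted_structure: str,
    score: Callable[[Candidate], float],
) -> List[Candidate]:
    """Keep candidates matching the predicted structure, then sort by score."""
    kept = [c for c in candidates if c.structure == predicted_structure]
    return sorted(kept, key=score, reverse=True)


def toy_score(c: Candidate) -> float:
    # Stand-in for a learned ranking model (assumption, not the paper's ranker).
    return 1.0 if "directed_by" in c.query_graph else 0.0


candidates = [
    Candidate("(?x, written_by, Nolan)", "one-hop"),
    Candidate("(?x, starred, ?y) (?y, directed_by, Nolan)", "two-hop"),
    Candidate("(?x, directed_by, Nolan)", "one-hop"),
]

# Pretend the structure predictor returned "one-hop" for the question.
ranked = filter_then_rank(candidates, "one-hop", toy_score)
for c in ranked:
    print(c.query_graph)
```

The filter runs before scoring so that the (typically more expensive) ranker only sees candidates consistent with the predicted structure.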
A Condensed Transition Graph Framework for Zero-shot Link Prediction with Large Language Models
Zero-shot link prediction (ZSLP) on knowledge graphs aims at automatically
identifying relations between given entities. Existing methods primarily employ
auxiliary information to predict the tail entity given the head entity and its
relation, yet face challenges due to the occasional unavailability of such
detailed information and the inherent simplicity of predicting tail entities
based on semantic similarities. Even though Large Language Models (LLMs) offer
a promising solution to predict unobserved relations between the head and tail
entity in a zero-shot manner, their performance is still restricted due to the
inability to leverage all the (exponentially many) paths' information between
two entities, which are critical in collectively indicating their relation
types. To address this, in this work, we introduce a Condensed Transition Graph
Framework for Zero-Shot Link Prediction (CTLP), which encodes all the paths'
information in linear time complexity to predict unseen relations between
entities, attaining both efficiency and information preservation. Specifically,
we design a condensed transition graph encoder with theoretical guarantees on
its coverage, expressiveness, and efficiency. It is learned by a transition
graph contrastive learning strategy. Subsequently, we design a soft instruction
tuning to learn and map the all-path embedding to the input of LLMs.
Experimental results show that our proposed CTLP method achieves
state-of-the-art performance on three standard ZSLP datasets.
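To see why exponentially many paths need not be enumerated one by one, consider the simpler problem of counting paths between two nodes. The sketch below is not the CTLP encoder (which is a learned model); it is a standard dynamic program over a directed acyclic graph, used here only to illustrate how a single pass over the edges can summarize every path. The function name `count_paths` and the toy "diamond chain" graph are assumptions made for the example, and real knowledge graphs generally contain cycles that this sketch does not handle.

```python
from collections import defaultdict, deque
from typing import Hashable, Iterable, Tuple


def count_paths(
    edges: Iterable[Tuple[Hashable, Hashable]],
    source: Hashable,
    target: Hashable,
) -> int:
    """Count directed source->target paths in a DAG in O(V + E) time."""
    adj = defaultdict(list)
    indeg = defaultdict(int)
    nodes = {source, target}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: process nodes in topological order.
    queue = deque(n for n in nodes if indeg[n] == 0)
    paths = defaultdict(int)
    paths[source] = 1
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            paths[v] += paths[u]  # every path reaching u extends to v
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return paths[target]


# Toy graph: a chain of k "diamonds"; each diamond doubles the path count.
k = 10
edges = []
for i in range(k):
    s, t = f"s{i}", f"s{i + 1}"
    edges += [(s, f"a{i}"), (s, f"b{i}"), (f"a{i}", t), (f"b{i}", t)]

total = count_paths(edges, "s0", f"s{k}")
print(len(edges), total)
```

With only 40 edges, the toy graph has 2^10 source-to-target paths, yet each edge is visited once.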
TemPL: A Novel Deep Learning Model for Zero-Shot Prediction of Protein Stability and Activity Based on Temperature-Guided Language Modeling
We introduce TemPL, a novel deep learning approach for zero-shot prediction
of protein stability and activity, harnessing temperature-guided language
modeling. By assembling an extensive dataset of ten million sequence-host
bacterial strain optimal growth temperatures (OGTs) and ΔTm data for
point mutations under consistent experimental conditions, we effectively
compared TemPL with state-of-the-art models. Notably, TemPL demonstrated
superior performance in predicting protein stability. An ablation study was
conducted to elucidate the influence of OGT prediction and language modeling
modules on TemPL's performance, revealing the importance of integrating both
components. Consequently, TemPL offers considerable promise for protein
engineering applications, facilitating the design of mutation sequences with
enhanced stability and activity.
PeTailor: Improving Large Language Model by Tailored Chunk Scorer in Biomedical Triple Extraction
The automatic extraction of biomedical entities and their interactions from
unstructured data remains a challenging task due to the limited availability of
expert-labeled standard datasets. In this paper, we introduce PETAILOR, a
retrieval-based language framework that is augmented by a tailored chunk scorer.
Unlike previous retrieval-augmented language models (LM) that retrieve relevant
documents by calculating the similarity between the input sentence and the
candidate document set, PETAILOR segments the sentence into chunks and
retrieves the relevant chunk from our pre-computed chunk-based relational
key-value memory. Moreover, in order to comprehend the specific requirements of
the LM, PETAILOR adapts the tailored chunk scorer to the LM. We also introduce
GM-CIHT, an expert annotated biomedical triple extraction dataset with more
relation types. This dataset is centered on the non-drug treatment and general
biomedical domain. Additionally, we investigate the efficacy of triple
extraction models trained on general domains when applied to the biomedical
domain. Our experiments reveal that PETAILOR achieves state-of-the-art
performance on GM-CIHT.
Comment: this is the first preprint version
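The chunk-level retrieval described in this abstract can be illustrated with a small sketch. It is not the PETAILOR implementation: the fixed-size word chunks, the Jaccard overlap used as the scorer, and the two-entry `memory` dictionary are all assumptions chosen to keep the example self-contained, whereas the abstract describes a scorer that is tailored to the language model.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def chunk(sentence: str, size: int = 3) -> List[List[str]]:
    """Split a sentence into consecutive word chunks of at most `size` words."""
    words = sentence.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def jaccard(a: List[str], b: List[str]) -> float:
    """Word-overlap similarity; a stand-in for a learned chunk scorer."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(
    sentence: str, memory: Dict[str, Triple], size: int = 3
) -> Tuple[Optional[str], Optional[Triple], float]:
    """Return the memory key (and its value) that best matches any chunk."""
    best_key, best_score = None, -1.0
    for ch in chunk(sentence, size):
        for key in memory:
            s = jaccard(ch, key.split())
            if s > best_score:
                best_key, best_score = key, s
    return best_key, memory.get(best_key), best_score


# Hypothetical key-value memory: chunk text -> relational triple.
memory = {
    "aspirin reduces fever": ("aspirin", "treats", "fever"),
    "exercise lowers blood pressure": ("exercise", "treats", "hypertension"),
}

key, value, score = retrieve(
    "Regular exercise lowers blood pressure in adults", memory
)
print(key, value, round(score, 2))
```

Scoring chunks rather than the whole sentence lets a short, relevant span match a memory entry even when the rest of the sentence is unrelated.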
- …